Group Members: Michael Woo, Thomas Slawinski, Akaash Patel, Rahul Patel

Imports
Exploring the data

Note

Number of labels per image
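A minimal sketch of this count. It assumes each image's labels are stored as a delimiter-separated string of class indices such as `"8|5"`. The separator is an assumption: pass `sep=" "` for strings like `"16 0"`. The helper name `labels_per_image` is hypothetical.

```python
from collections import Counter

def labels_per_image(label_strings, sep="|"):
    """Return a Counter mapping 'number of labels' -> 'number of images'.

    Assumes each entry is a delimiter-separated string of class indices,
    e.g. "8|5" (the separator is an assumption; use sep=" " for "16 0").
    """
    counts = Counter()
    for s in label_strings:
        # Drop empty tokens so stray separators do not inflate the count.
        n_labels = len([tok for tok in s.split(sep) if tok.strip()])
        counts[n_labels] += 1
    return counts

print(sorted(labels_per_image(["8|5", "0", "16|0|3", "1"]).items()))
# [(1, 2), (2, 1), (3, 1)]
```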
Things to keep in mind regarding channels
Here we separate the images based on their ID and color channel
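A sketch of the grouping step. It assumes single-channel files named `<id>_<color>.png`, with the colors red, green, blue and yellow. Both the naming pattern and the helper `group_by_id` are assumptions, not taken from the notebook.

```python
import os
from collections import defaultdict

COLORS = ("red", "green", "blue", "yellow")  # assumed channel names

def group_by_id(filenames):
    """Map image ID -> {color: filename} for names like '<id>_<color>.png'."""
    groups = defaultdict(dict)
    for name in filenames:
        stem, _ext = os.path.splitext(os.path.basename(name))
        # Split on the last underscore so IDs containing '-' or '_' survive.
        image_id, sep, color = stem.rpartition("_")
        if not sep or color not in COLORS:
            continue  # skip files that do not follow the assumed pattern
        groups[image_id][color] = name
    return dict(groups)
```

For example, `group_by_id(["a1_red.png", "a1_green.png", "b2_blue.png", "notes.txt"])` yields `{"a1": {"red": "a1_red.png", "green": "a1_green.png"}, "b2": {"blue": "b2_blue.png"}}`, and the file that does not match the pattern is ignored.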
Viewing the images
Generating Stacked Images (Feature Engineering)
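One way to build a stacked image. This sketch assumes the red, green and blue channel images are same-sized 2-D NumPy arrays. The yellow channel is simply left out here, which may differ from what the notebook did. `stack_channels` is a hypothetical helper.

```python
import numpy as np

def stack_channels(red, green, blue):
    """Stack three same-shaped 2-D channel images into one (H, W, 3) array."""
    if not (red.shape == green.shape == blue.shape):
        raise ValueError("channel images must have the same shape")
    # The last axis becomes the color axis, as most image libraries expect.
    return np.stack([red, green, blue], axis=-1)
```

When the arrays are `uint8`, the result can be written out as an ordinary RGB image, for example with Pillow's `Image.fromarray`.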
Generating directories and saving images according to their associated labels
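A sketch of the per-label directory layout. It assumes a multi-label image is copied into the folder of every label it carries. The helper `save_by_label` is hypothetical, and it writes `.npy` arrays as a stand-in for the image files the notebook saves.

```python
import os
import numpy as np

def save_by_label(root, image_id, labels, image):
    """Save `image` once under root/<label>/ for every label it carries.

    Assumption: multi-label images are duplicated into each label folder,
    and arrays are stored as .npy files instead of image files.
    """
    paths = []
    for label in labels:
        label_dir = os.path.join(root, str(label))
        os.makedirs(label_dir, exist_ok=True)  # create the folder on first use
        path = os.path.join(label_dir, f"{image_id}.npy")
        np.save(path, image)
        paths.append(path)
    return paths
```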
Further Feature Engineering
Size of images and batch size variables
Training Data Generator
Validation Data Generator
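The generator cells themselves are not reproduced here. As a framework-free sketch, a generator can yield mini-batches, with shuffling turned on for training and off for validation. It assumes the images are already loaded as a NumPy array `X` with multi-hot targets `y`. `batch_generator` is a hypothetical name.

```python
import numpy as np

def batch_generator(X, y, batch_size=32, shuffle=True, seed=None):
    """Yield (X_batch, y_batch) pairs covering the data once."""
    rng = np.random.default_rng(seed)
    idx = np.arange(len(X))
    if shuffle:
        rng.shuffle(idx)  # shuffle for training; keep order fixed for validation
    for start in range(0, len(idx), batch_size):
        batch = idx[start:start + batch_size]
        yield X[batch], y[batch]
```

This generator stops after a single pass over the data. A training loop that expects an endless stream can wrap it in `while True:`.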
Training Data
Validation Data
Subsetting Data
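A minimal subsetting sketch. It assumes features `X` and targets `y` are NumPy arrays with one row per image. `random_subset` is a hypothetical helper, and drawing uniformly at random (rather than stratifying by label) is an assumption. A smaller `n` gives the kind of reduced subset that speeds up slower models.

```python
import numpy as np

def random_subset(X, y, n, seed=0):
    """Return n matching rows of (X, y), drawn without replacement."""
    rng = np.random.default_rng(seed)
    n = min(n, len(X))  # never ask for more rows than exist
    idx = rng.choice(len(X), size=n, replace=False)
    return X[idx], y[idx]
```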
Functions to create multi-label ROC curves and AUC scores
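A multi-label ROC AUC can be computed one label at a time (one-vs-rest). The sketch below does this in plain NumPy. It assumes `Y_true` is a multi-hot array and `Y_score` holds per-label scores of the same shape. The function names are hypothetical, and scikit-learn's `roc_curve` and `roc_auc_score` cover the same ground.

```python
import numpy as np

def roc_curve_binary(y_true, scores):
    """Return (fpr, tpr) for one binary label, sweeping thresholds high to low."""
    y_true = np.asarray(y_true).astype(bool)
    scores = np.asarray(scores, dtype=float)
    order = np.argsort(-scores, kind="mergesort")
    y_sorted, s_sorted = y_true[order], scores[order]
    # Keep the last index of each run of tied scores so ties move together.
    distinct = np.r_[np.nonzero(np.diff(s_sorted))[0], len(s_sorted) - 1]
    tps = np.cumsum(y_sorted)[distinct]
    fps = np.cumsum(~y_sorted)[distinct]
    tpr = np.r_[0.0, tps / max(tps[-1], 1)]
    fpr = np.r_[0.0, fps / max(fps[-1], 1)]
    return fpr, tpr

def auc_trapezoid(x, y):
    """Area under a piecewise-linear curve via the trapezoidal rule."""
    return float(np.sum((x[1:] - x[:-1]) * (y[1:] + y[:-1]) / 2))

def multilabel_roc_auc(Y_true, Y_score):
    """Per-label AUC dict plus macro average for (n_samples, n_labels) arrays."""
    Y_true, Y_score = np.asarray(Y_true), np.asarray(Y_score)
    aucs = {}
    for k in range(Y_true.shape[1]):
        col = Y_true[:, k]
        if col.min() == col.max():
            continue  # AUC is undefined when a label has only one outcome
        fpr, tpr = roc_curve_binary(col, Y_score[:, k])
        aucs[k] = auc_trapezoid(fpr, tpr)
    macro = float(np.mean(list(aucs.values()))) if aucs else float("nan")
    return aucs, macro
```

Each `(fpr, tpr)` pair can be drawn as one curve per label, for example with matplotlib's `plot`.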
Naive Bayes Classifier
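A sketch of a multi-label Naive Bayes baseline. It assumes images are NumPy arrays flattened to pixel vectors and labels are multi-hot. Wrapping scikit-learn's `GaussianNB` in `OneVsRestClassifier` (one binary model per label) is an assumed setup, and the helper names are hypothetical.

```python
import numpy as np
from sklearn.multiclass import OneVsRestClassifier
from sklearn.naive_bayes import GaussianNB

def flatten_images(X_images):
    """Turn (n, H, W, C) images into (n, H*W*C) float feature vectors."""
    return X_images.reshape(len(X_images), -1).astype(np.float32)

def fit_naive_bayes(X_images, Y):
    """Fit one Gaussian Naive Bayes model per label (one-vs-rest)."""
    model = OneVsRestClassifier(GaussianNB())
    model.fit(flatten_images(X_images), Y)  # Y: (n, n_labels) multi-hot array
    return model
```

Predictions come from `model.predict(flatten_images(X_val))`, and per-label scores for ROC curves come from `model.predict_proba(flatten_images(X_val))`.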
Plot classification report results (Naive Bayes)
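A sketch of turning a classification report into a plot. It assumes multi-hot `Y_true` and `Y_pred` arrays, uses scikit-learn's `classification_report(..., output_dict=True)` and a matplotlib heatmap, and both helper names are hypothetical. The same helpers would apply unchanged to the Random Forest and XGBoost reports.

```python
import numpy as np
from sklearn.metrics import classification_report

def report_rows(Y_true, Y_pred, label_names):
    """Return [(name, precision, recall, f1), ...] for each label."""
    report = classification_report(
        Y_true, Y_pred, target_names=label_names, output_dict=True, zero_division=0
    )
    return [
        (name, float(report[name]["precision"]), float(report[name]["recall"]),
         float(report[name]["f1-score"]))
        for name in label_names
    ]

def plot_report(rows, title="Classification report"):
    """Draw precision/recall/F1 per label as a heatmap."""
    import matplotlib.pyplot as plt  # imported lazily; only needed for plotting
    names = [r[0] for r in rows]
    scores = np.array([r[1:] for r in rows])
    fig, ax = plt.subplots()
    im = ax.imshow(scores, vmin=0, vmax=1, cmap="viridis")
    ax.set_xticks(range(3))
    ax.set_xticklabels(["precision", "recall", "f1-score"])
    ax.set_yticks(range(len(names)))
    ax.set_yticklabels(names)
    fig.colorbar(im, ax=ax)
    ax.set_title(title)
    return fig
```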
Random Forest Classifier
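A Random Forest sketch under the same assumptions (flattened pixel features, multi-hot labels). scikit-learn's `RandomForestClassifier` accepts a 2-D label array directly as a multi-output problem, so no one-vs-rest wrapper is needed. `predict_proba` then returns one probability array per label. The helper names and the `n_estimators` default are illustrative.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def fit_random_forest(X_images, Y, n_estimators=100, seed=0):
    """Fit one multi-output forest on flattened images and multi-hot labels."""
    X = X_images.reshape(len(X_images), -1)
    model = RandomForestClassifier(n_estimators=n_estimators, n_jobs=-1, random_state=seed)
    model.fit(X, Y)  # a 2-D Y is treated as one binary output per label
    return model

def positive_scores(model, X_images):
    """Per-label probability of the positive class, shape (n_samples, n_labels)."""
    X = X_images.reshape(len(X_images), -1)
    cols = []
    for p, classes in zip(model.predict_proba(X), model.classes_):
        # A label seen with only one class during training has a single column.
        cols.append(p[:, 1] if len(classes) == 2 else np.full(len(X), float(classes[0] == 1)))
    return np.column_stack(cols)
```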
Plot classification report results (Random Forest)

XGBoost Classifier

Reducing the subset specifically for XGBoost to speed up model training
Plot classification report results (XGBoost)

Cellular Masking (further potential feature engineering)

Idea: mask images by color to produce more readable images to feed to the models as input

We did not perform this step because of limited computational resources and time: when we attempted a realistic sample size, producing 3 stacked images per classification took almost an hour.
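The masking idea can be sketched as a per-pixel threshold on one color channel. This assumes a stacked `(H, W, C)` array. The channel index and fixed threshold are hypothetical parameters, and a data-driven threshold (such as Otsu's method) could replace the fixed value.

```python
import numpy as np

def mask_by_channel(image, channel, threshold):
    """Zero out pixels whose value in `channel` is at or below `threshold`.

    image: (H, W, C) array. Returns (masked_image, boolean_mask).
    """
    mask = image[..., channel] > threshold
    # Broadcast the (H, W) mask over every channel of the pixel.
    masked = np.where(mask[..., None], image, 0)
    return masked, mask
```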

Function to convert .tif.gz to .png and put it in the same folder
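A sketch of the conversion. It assumes Pillow can read the decompressed TIFF and that each input path ends in `.tif.gz`. `tif_gz_to_png` is a hypothetical name. The PNG is written next to the source file with the same base name.

```python
import gzip
import io
from PIL import Image

def tif_gz_to_png(path):
    """Decompress `<name>.tif.gz` and save it as `<name>.png` in the same folder."""
    if not path.endswith(".tif.gz"):
        raise ValueError(f"expected a .tif.gz file, got {path!r}")
    out_path = path[: -len(".tif.gz")] + ".png"
    with gzip.open(path, "rb") as f:
        data = f.read()  # raw TIFF bytes
    with Image.open(io.BytesIO(data)) as img:
        img.save(out_path)  # format inferred from the .png extension
    return out_path
```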
All label names in the public HPA and their corresponding indices
Function to convert label name to index
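A sketch of the conversion. It assumes label names arrive as one comma-separated string per image. The mapping below is a hypothetical, partial stand-in with placeholder indices; the notebook's full table is not reproduced. Silently skipping unknown names is also an assumption, and mapping them to a catch-all index would be an alternative.

```python
# Hypothetical, partial mapping with placeholder indices (illustration only).
LABEL_TO_INDEX = {
    "Nucleoplasm": 0,
    "Nuclear membrane": 1,
    "Nucleoli": 2,
}

def label_names_to_indices(label_field, mapping=LABEL_TO_INDEX, sep=","):
    """Convert e.g. 'Nucleoplasm,Nucleoli' to [0, 2], skipping unknown names."""
    indices = set()
    for name in label_field.split(sep):
        name = name.strip()  # tolerate spaces after separators
        if name in mapping:
            indices.add(mapping[name])
    return sorted(indices)
```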
Reading in the TSV file
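Reading a TSV needs only the standard library. This sketch uses a hypothetical helper, `read_tsv`, that returns one dict per row keyed by the header. With pandas, `pd.read_csv(path, sep="\t")` gives a DataFrame instead.

```python
import csv

def read_tsv(path):
    """Read a tab-separated file with a header row into a list of dicts."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f, delimiter="\t"))
```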
This only has to be installed once
Segmenting the cells
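The notebook's segmentation tool is not named in this section, so the sketch below swaps in a deliberately simple classical method. It thresholds one channel (for example the nucleus stain) and labels each 4-connected blob as a separate object. `label_components` and `segment_cells` are hypothetical helpers. `scipy.ndimage.label` with its default structuring element performs the same labeling much faster on full-size images.

```python
from collections import deque
import numpy as np

def label_components(mask):
    """Label 4-connected foreground regions of a boolean mask.

    Returns (labels, n): labels is 0 for background and 1..n per region.
    """
    mask = np.asarray(mask, dtype=bool)
    labels = np.zeros(mask.shape, dtype=np.int32)
    h, w = mask.shape
    n = 0
    for i in range(h):
        for j in range(w):
            if mask[i, j] and labels[i, j] == 0:
                n += 1
                labels[i, j] = n
                queue = deque([(i, j)])
                while queue:  # breadth-first flood fill of one region
                    y, x = queue.popleft()
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < h and 0 <= nx < w and mask[ny, nx] and labels[ny, nx] == 0:
                            labels[ny, nx] = n
                            queue.append((ny, nx))
    return labels, n

def segment_cells(channel, threshold):
    """Threshold one 2-D channel and label each connected blob as a cell."""
    return label_components(channel > threshold)
```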

Visualization of images stacked with masking